Data Science and Big Data in Energy Forecasting

Author: Martínez Álvarez Francisco
Riquelme Santos José Cristóbal
Troncoso Lora Alicia
Publication venue: 'MDPI AG'
Publication date: 01/11/2018

This editorial summarizes the outcome of the special issue entitled Data Science and Big Data in Energy Forecasting, published in MDPI’s Energies journal. The special issue ran in 2017 and accepted a total of 13 papers from 7 different countries. Electrical, solar and wind energy forecasting were the most analyzed topics, introducing new methods with highly relevant applications.

Funding: Ministerio de Competitividad TIN2014-55894-C2-R; Ministerio de Competitividad TIN2017-88209-C2-

Directory of Open Access Journals

idUS. Depósito de Investigación Universidad de Sevilla

A Framework for Evaluating Land Use and Land Cover Classification Using Convolutional Neural Networks

Author: Carranza García Manuel
García Gutiérrez Jorge
Riquelme Santos José Cristóbal
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Analyzing land use and land cover (LULC) using remote sensing (RS) imagery is essential for many environmental and social applications. The increase in availability of RS data has led to the development of new techniques for digital pattern classification. Very recently, deep learning (DL) models have emerged as a powerful solution to approach many machine learning (ML) problems. In particular, convolutional neural networks (CNNs) are currently the state of the art for many image classification tasks. While there exist several promising proposals on the application of CNNs to LULC classification, the validation framework proposed for the comparison of different methods could be improved with the use of a standard validation procedure for ML based on cross-validation and its subsequent statistical analysis. In this paper, we propose a general CNN, with a fixed architecture and parametrization, to achieve high accuracy on LULC classification over RS data from different sources such as radar and hyperspectral. We also present a methodology to perform a rigorous experimental comparison between our proposed DL method and other ML algorithms such as support vector machines, random forests, and k-nearest-neighbors. The analysis carried out demonstrates that the CNN outperforms the other techniques, achieving a high level of performance for all the datasets studied, regardless of their different characteristics.

Funding: Ministerio de Economía y Competitividad TIN2014-55894-C2-1-R; Ministerio de Economía y Competitividad TIN2017-88209-C2-2-

Directory of Open Access Journals

idUS. Depósito de Investigación Universidad de Sevilla

Tackling Ant Colony Optimization Meta-Heuristic as Search Method in Feature Subset Selection Based on Correlation or Consistency Measures

Author: Riquelme Santos José Cristóbal
Tallón Ballesteros Antonio Javier
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014

This paper introduces the use of an ant colony optimization (ACO) algorithm, called Ant System, as a search method in two well-known feature subset selection methods based on correlation or consistency measures: CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection). ACO guides the search using a heuristic evaluator. Empirical results on twelve real-world classification problems are reported. Statistical tests have revealed that InfoGain is a very suitable heuristic for CFS or CNS feature subset selection methods with ACO acting as the search method. InfoGain is shown to be the significantly better heuristic over a range of classifiers. For most of the problems, the results achieved by ACO-based feature subset selection with the suitable heuristic evaluator are better than those obtained with CFS or CNS combined with Best First search.

Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752
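As an illustration of the InfoGain heuristic named in this abstract, the following minimal sketch computes the standard information gain of a discrete feature with respect to the class. It is not the paper's implementation and does not include the Ant System search; the function names and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(feature_values, labels):
    """Information gain of a discrete feature: H(class) - H(class | feature)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one feature that separates the classes, one that does not.
labels = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]
useless = ["a", "b", "a", "b"]

gain_perfect = info_gain(perfect, labels)
gain_useless = info_gain(useless, labels)
```

A search method could use such per-feature scores to bias which feature is tried next; how the scores enter the ant transition rule is not described in the abstract.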

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach

Author: Riquelme Santos José Cristóbal
Tallón Ballesteros Antonio Javier
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2015

This paper presents a novel procedure that applies, in sequence, two data preparation techniques of a different nature: data cleansing and feature selection. For the former, we experimented with a partial removal of outliers via the inter-quartile range; for the latter, we selected relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are outlined on seven difficult binary and multi-class data sets, that is, data sets with a test error rate of at least 10%, according to accuracy, with C4.5 or 1-nearest neighbour classifiers without any prior data pre-processing. Non-parametric statistical tests assert that combining the two data preparation strategies, using a correlation measure for feature selection with the C4.5 algorithm, is significantly better, measured with the ROC measure, than applying the data cleansing approach alone. Last but not least, a weak learner like PART achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. To sum up, under the new approach and for the ROC measure, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure.

Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752
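The inter-quartile range idea mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the common 1.5 x IQR fences computed separately for each class on a single numeric attribute, and it drops every flagged value. The abstract does not specify the fence multiplier or how the "partial" removal is decided, so those details, the function names, and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR (needs >= 2 values)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def remove_outliers_by_class(samples, k=1.5):
    """Keep the (value, label) pairs that fall inside the fences of their own class."""
    by_class = defaultdict(list)
    for value, label in samples:
        by_class[label].append(value)
    bounds = {label: iqr_bounds(vals, k) for label, vals in by_class.items()}
    return [(v, c) for v, c in samples if bounds[c][0] <= v <= bounds[c][1]]


# Hypothetical toy data: each class contains one extreme value.
samples = [(v, "a") for v in (1, 2, 3, 4, 5, 6, 7, 8, 100)]
samples += [(v, "b") for v in (10, 11, 12, 13, 14, 15, 16, 17, -50)]

cleaned = remove_outliers_by_class(samples)
```

The cleaned samples would then be passed to a feature selector and a classifier; that part of the pipeline is not shown here.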

idUS. Depósito de Investigación Universidad de Sevilla

Deleting or Keeping Outliers for Classifier Training?

Author: Riquelme Santos José Cristóbal
Tallón Ballesteros Antonio Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

This paper introduces two statistical outlier detection approaches by classes. Experiments on binary and multi-class classification problems reveal that the partial removal of outliers significantly improves one or two performance measures for C4.5 and 1-nearest neighbour classifiers. Also, a taxonomy of problems according to the amount of outliers is proposed.

Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Minería de Datos: Conceptos y Tendencias

Author: Gilbert Karina
Riquelme Santos José Cristóbal
Ruiz Roberto
Publication venue: IBERAMIA : Sociedad Iberoamericana de Inteligencia Artificial
Publication date: 01/01/2006

Nowadays, data mining (DM) is increasingly capturing the attention of companies. It is still uncommon to hear phrases such as “we should segment our customers using DM tools”, “DM will increase customer satisfaction”, or “the competition is using DM to gain market share”. However, everything suggests that sooner rather than later data mining will be used by society, with at least the same weight that statistics currently has. So what is data mining and what benefits does it bring? How can this technology influence the resolution of the everyday problems of companies and of society in general? Which technologies are behind data mining? What is the life cycle of a typical data mining project? This article attempts to clarify these questions through an introduction to data mining: its definition, examples of problems that can be solved with data mining, data mining tasks, the techniques used and, finally, challenges and trends in data mining.

Secretaría de Estado de Cultura

idUS. Depósito de Investigación Universidad de Sevilla

Improving the Evolutionary Coding for Machine Learning Tasks

Author: Aguilar Ruiz Jesús Salvador
Riquelme Santos José Cristóbal
Valle Sevillano Carmelo del
Publication venue: 'IOS Press'
Publication date: 01/01/2002

The most influential factors in the quality of the solutions found by an evolutionary algorithm are a correct coding of the search space and an appropriate evaluation function of the potential solutions. This paper addresses the coding of the search space for obtaining decision rules, i.e., the representation of the individuals of the genetic population. Two new methods for encoding discrete and continuous attributes are presented. Our “natural coding” uses one gene per attribute (continuous or discrete), leading to a reduction in the search space. Genetic operators for this natural coding are formally described, and the reduction of the size of the search space is analysed for several databases from the UCI machine learning repository.

Funding: Comisión Interministerial de Ciencia y Tecnología TIC1143–C03–0

idUS. Depósito de Investigación Universidad de Sevilla

Partitioning-Clustering Techniques Applied to the Electricity Price Time Series

Author: Martínez Álvarez Francisco
Riquelme Santos Jesús Manuel
Riquelme Santos José Cristóbal
Troncoso Lora Alicia
Publication date: 01/01/2007

Clustering is used to generate groupings of data from a large dataset, with the intention of representing the behavior of a system as accurately as possible. In this work, clustering is applied to extract useful information from the electricity price time series. Specifically, two clustering techniques, K-means and Expectation Maximization, are used to analyze the price curve, showing that these techniques effectively split the whole year into different groups of days according to their price behavior. This information will later be used to predict prices in the short term. The prices exhibited a remarkable resemblance among days within the same season, and the days can be split into two major kinds of clusters: working days and holidays.
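To make the idea of grouping days by their price curves concrete, here is a minimal K-means sketch in plain Python. It covers only K-means, not Expectation Maximization, and it is not the authors' code. The daily curves are invented toy values with four prices per day (real data would typically have one price per hour), and the function names are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equally long price curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_curve(curves):
    """Point-wise average of a non-empty list of curves."""
    return [sum(col) / len(curves) for col in zip(*curves)]


def kmeans(curves, k, max_iter=100, seed=0):
    """Basic Lloyd iteration: assign each day to its nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(curves, k)]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: sq_dist(c, centroids[j])) for c in curves]
        updated = []
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(mean_curve(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical toy data: three working days (higher prices) and two holidays.
days = [
    [30, 50, 55, 40],
    [31, 52, 54, 41],
    [29, 49, 56, 39],
    [20, 30, 32, 25],
    [21, 29, 31, 24],
]
labels, centroids = kmeans(days, k=2)
```

On this toy input the two clusters coincide with the working-day and holiday groups; on real price series the number of clusters and the day types they capture would need to be validated.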

idUS. Depósito de Investigación Universidad de Sevilla

Fast feature selection aimed at high-dimensional data via hybrid-sequential-ranked searches

Author: García Torres M.
Riquelme Santos José Cristóbal
Ruiz Roberto
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

We address the feature subset selection problem for classification tasks. We examine the performance of two hybrid strategies that directly search on a ranked list of features and compare them with two widely used algorithms, the fast correlation based filter (FCBF) and sequential forward selection (SFS). The proposed hybrid approaches provide the possibility of efficiently applying any subset evaluator, with a wrapper model included, to large and high-dimensional domains. The experiments performed show that our two strategies are competitive and can select a small subset of features without degrading the classification error or the advantages of the strategies under study.
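The general idea of searching over a ranked list of features can be sketched as below. This is a generic incremental search written for illustration, not either of the two hybrid strategies from the paper, whose details the abstract does not give. The subset evaluator, the feature names, and the scoring table are all hypothetical.

```python
def ranked_forward_search(ranked_features, evaluate, min_gain=0.0):
    """Walk the ranking once; keep a feature only if it improves the subset score."""
    selected = []
    best = evaluate(selected)
    for feature in ranked_features:
        score = evaluate(selected + [feature])
        if score - best > min_gain:
            selected.append(feature)
            best = score
    return selected, best


# Hypothetical evaluator: f1 and f2 carry the same information (group "A"),
# f3 carries different information (group "B"), f4 carries none.
GROUP = {"f1": "A", "f2": "A", "f3": "B", "f4": None}
VALUE = {"A": 0.5, "B": 0.3}


def toy_evaluator(subset):
    """Score = value of the distinct information groups minus a small size penalty."""
    groups = {GROUP[f] for f in subset if GROUP[f] is not None}
    return sum(VALUE[g] for g in groups) - 0.01 * len(subset)


ranking = ["f1", "f2", "f3", "f4"]  # assumed to come from some ranking criterion
selected, best = ranked_forward_search(ranking, toy_evaluator)
```

Because the ranking is traversed only once, the evaluator is called a number of times linear in the number of features, which is what makes this style of search attractive for high-dimensional data.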

idUS. Depósito de Investigación Universidad de Sevilla

Analysis of Measures of Quantitative Association Rules

Author: Martínez Ballesteros María del Mar
Riquelme Santos José Cristóbal
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2011

This paper presents the analysis of relationships among different interestingness measures of the quality of association rules, as a first step to select the best objectives in order to develop a multi-objective algorithm. For this purpose, the discovery of association rules is based on evolutionary techniques. Specifically, a genetic algorithm has been used to mine quantitative association rules and determine the intervals on the attributes without discretizing the data beforehand. The algorithm has been applied to real-world climatological datasets based on Ozone and Earthquake data.

Funding: Ministerio de Ciencia y Tecnología TIN2007-68084-C-00; Junta de Andalucía P07-TIC-0261
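For readers unfamiliar with quantitative association rules, the sketch below evaluates one rule whose antecedent and consequent are intervals on numeric attributes. The abstract does not list which interestingness measures were analyzed; support, confidence, and lift are used here only as common examples. The attribute names, intervals, and records are invented for illustration.

```python
def matches(record, intervals):
    """True if every attribute in `intervals` lies within its [low, high] range."""
    return all(low <= record[attr] <= high for attr, (low, high) in intervals.items())


def rule_measures(records, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(matches(r, antecedent) for r in records)
    n_c = sum(matches(r, consequent) for r in records)
    n_ac = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy records.
records = [
    {"temperature": 30, "ozone": 80},
    {"temperature": 32, "ozone": 90},
    {"temperature": 28, "ozone": 40},
    {"temperature": 15, "ozone": 30},
    {"temperature": 12, "ozone": 85},
]
antecedent = {"temperature": (25, 35)}
consequent = {"ozone": (70, 100)}

support, confidence, lift = rule_measures(records, antecedent, consequent)
```

In an evolutionary setting, an individual would encode the interval bounds and measures like these would serve as candidate objectives.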

idUS. Depósito de Investigación Universidad de Sevilla